Parallel K-Medoids++ Spatial Clustering Algorithm Based on MapReduce
Abstract
Clustering analysis has received considerable attention in spatial data mining for several years. With the rapid development of geospatial information technologies, the volume of spatial data is growing exponentially, which makes clustering massive spatial data a challenging task. To improve the efficiency of spatial clustering on large-scale data, many researchers have proposed efficient parallel clustering algorithms. This paper proposes a new K-Medoids++ spatial clustering algorithm based on MapReduce for clustering massive spatial data. An initialization algorithm that decreases the number of iterations is combined with the MapReduce framework. Comparative experiments conducted over different datasets and different numbers of nodes indicate that the proposed K-Medoids++ spatial clustering algorithm is more efficient than traditional K-Medoids and scales well while processing massive spatial data on commodity hardware.
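The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of how such a scheme might look, not the authors' implementation. It assumes that "++" refers to k-means++-style distance-weighted seeding restricted to data points, that the map step assigns each point to its nearest medoid, and that the reduce step picks the cluster member with the smallest total distance to the other members. The names `seed_medoids`, `map_assign`, `reduce_update` and `k_medoids` are hypothetical, and the shuffle phase of a real MapReduce job is imitated with a dictionary.

```python
import random
from collections import defaultdict
from math import hypot


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return hypot(p[0] - q[0], p[1] - q[1])


def seed_medoids(points, k, rng):
    """Assumed '++' seeding: pick later medoids with probability
    proportional to the squared distance to the nearest chosen medoid.
    Requires at least k distinct points."""
    medoids = [rng.choice(points)]
    while len(medoids) < k:
        weights = [min(dist(p, m) for m in medoids) ** 2 for p in points]
        medoids.append(rng.choices(points, weights=weights, k=1)[0])
    return medoids


def map_assign(point, medoids):
    """Map step (assumed): emit (index of nearest medoid, point)."""
    idx = min(range(len(medoids)), key=lambda i: dist(point, medoids[i]))
    return idx, point


def reduce_update(members):
    """Reduce step (assumed): the new medoid is the member with the
    smallest summed distance to all members of its cluster."""
    return min(members, key=lambda c: sum(dist(c, p) for p in members))


def k_medoids(points, k, max_iter=20, seed=0):
    """Iterate map and reduce until the medoids stop changing."""
    rng = random.Random(seed)
    medoids = seed_medoids(points, k, rng)
    for _ in range(max_iter):
        groups = defaultdict(list)  # stands in for the shuffle phase
        for p in points:
            idx, value = map_assign(p, medoids)
            groups[idx].append(value)
        new_medoids = [
            reduce_update(groups[i]) if groups[i] else medoids[i]
            for i in range(k)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids
```

Calling `k_medoids(points, 2)` on two well-separated groups of points returns one medoid per group. In an actual MapReduce deployment each iteration would be one job, with the current medoids distributed to every mapper; that wiring is omitted here.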
Similar resources
A Rank-Based K-medoids Clustering Algorithm by a Specific P System
In this paper, a rank-based K-medoids algorithm by a specific P system is proposed, which exhibits a novel aspect of applying membrane computing in clustering. The traditional K-medoids clustering result is sensitive to randomly selected initial medoids. To overcome this defect, we introduce a rank based on the similarity between pairs of objects for the initialization. As a biological computi...
Parallel K-Means Clustering Based on MapReduce
Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation and pattern classification. The growing volume of information produced by the progress of technology makes clustering very large-scale data a challenging task. In order to deal with the problem, many researchers try to design efficient parallel clu...
Parallel Multi-Swarm PSO Based on K-Medoids and Uniform Design
PAM (Partitioning Around Medoids) is introduced to divide the swarm into several different subpopulations. PAM is one of the k-medoids clustering algorithms based on partitioning methods. It attempts to divide n objects into k partitions. This algorithm overcomes the k-means algorithm's drawback of being sensitive to the initial partitions. In the parallel PSO algorithms, the swarm needs to be divi...
Parallelising the k-Medoids Clustering Problem Using Space-Partitioning
The k-medoids problem is a combinatorial optimisation problem with multiple applications in Resource Allocation, Mobile Computing, Sensor Networks and Telecommunications. Real instances of this problem involve hundreds of thousands of points and thousands of medoids. Despite the proliferation of parallel architectures, this problem has been mostly tackled using sequential approaches. In this p...
MapReduce K-Means based Co-Clustering Approach for Web Page Recommendation System
Co-clustering is one of the data mining techniques used for web usage mining. Co-clustering Web log data is the process of simultaneously categorizing both users and pages. It is used to extract users' information based on a subset of pages. Nowadays, cyberspace is filled with a huge volume of data distributed across the world. The business knowledge acquaintance from such a voluminous d...
Journal: CoRR
Volume: abs/1608.06861
Publication date: 2016